Introduction

Databionic swarm (DBS) is a flexible and robust clustering framework that consists of three independent modules. The first module is the parameter-free projection method Pswarm, which exploits the concepts of self-organization and emergence, game theory, swarm intelligence and symmetry considerations. The second module is a parameter-free high-dimensional data visualization technique, which generates projected points on a topographic map with hypsometric colors, called the generalized U-matrix. The third module is a clustering method with no sensitive parameters. The clustering can be verified by the visualization and vice versa. The term DBS refers to the method as a whole.

For further details, see Databionic swarm in [Thrun, 2018], chapter 8. Further examples and a comparison to 26 common clustering algorithms are provided at http://www.deepbionics.org/Projects/ClusteringAlgorithms.html. If you want to verify your clustering result externally, you can use Heatmap or SilhouettePlot from the CRAN package DataVisualizations.

First Example: Automatic approach

First Module: Projection of high-dimensional Data

2D projection with instant visualization of the annealing steps. The distance matrix has to be defined by the user.

library(DatabionicSwarm)
## Package 'DatabionicSwarm' version 1.1.0.
## Type 'citation('DatabionicSwarm')' for citing this R package in publications.
data('Hepta')
InputDistances=as.matrix(dist(Hepta$Data))
projection=Pswarm(InputDistances)
## 
  |=================================================================| 100%
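
The distance matrix does not have to be Euclidean. The following minimal sketch uses only base R and a small made-up data matrix to show two ways of building such an input and which properties it should have. The data values and the helper is_distance_matrix are assumptions for illustration and are not part of the package.

```r
# Made-up data: 6 points in 3 dimensions (hypothetical values for illustration).
Data <- matrix(c(0, 0, 0,
                 1, 0, 0,
                 0, 2, 0,
                 5, 5, 5,
                 6, 5, 5,
                 5, 7, 5), ncol = 3, byrow = TRUE)

# Euclidean distances, built the same way as in the example above.
EuclideanDistances <- as.matrix(dist(Data, method = "euclidean"))

# Any other metric supported by dist() can be used instead, e.g. Manhattan.
ManhattanDistances <- as.matrix(dist(Data, method = "manhattan"))

# Hypothetical helper: a distance matrix should be square, symmetric,
# non-negative and have a zero diagonal.
is_distance_matrix <- function(D) {
  is.matrix(D) && nrow(D) == ncol(D) &&
    isSymmetric(unname(D)) && all(diag(D) == 0) && all(D >= 0)
}

is_distance_matrix(EuclideanDistances)
is_distance_matrix(ManhattanDistances)

# With DatabionicSwarm installed, such a matrix is passed on as shown above:
# projection <- Pswarm(ManhattanDistances)
```

Checking the matrix before the projection is cheap and catches the most common mistake, passing a data matrix where a distance matrix was intended.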

Second Module: Generalized Umatrix

Here the Generalized Umatrix is calculated using a simplified emergent self-organizing map algorithm. The Generalized Umatrix is then visualized as a 3D landscape, called a topographic map with hypsometric tints. Seven valleys are shown, resulting in seven main clusters. Note that the resulting visualization is toroidal, meaning that the left border connects cyclically to the right border (and the bottom to the top). This means there are no “real” borders in this visualization; instead, the visualization is “continuous”.

library(DatabionicSwarm)
library(GeneralizedUmatrix)
## 
## Attaching package: 'GeneralizedUmatrix'
## The following object is masked from 'package:DatabionicSwarm':
## 
##     Lsun3D
library(rgl)
visualization=GeneratePswarmVisualization(Data = Hepta$Data,projection$ProjectedPoints,projection$LC)
## [1] "Initializing sESOM algorithm"
## [1] "Operator: getUmatrix4BMUs() at 8%"
## [1] "Operator: getUmatrix4BMUs() at 17%"
## [1] "Operator: getUmatrix4BMUs() at 25%"
## [1] "Operator: getUmatrix4BMUs() at 33%"
## [1] "Operator: getUmatrix4BMUs() at 42%"
## [1] "Operator: getUmatrix4BMUs() at 50%"
## [1] "Operator: getUmatrix4BMUs() at 58%"
## [1] "Operator: getUmatrix4BMUs() at 67%"
## [1] "Operator: getUmatrix4BMUs() at 75%"
## [1] "Operator: getUmatrix4BMUs() at 83%"
## [1] "Operator: getUmatrix4BMUs() at 92%"
## [1] "Operator: getUmatrix4BMUs() at 92%"
## [1] "Calculating Umatrix"
rgl::open3d()
## wgl 
##   1
GeneralizedUmatrix::plotTopographicMap(visualization$Umatrix,visualization$Bestmatches)
## Loading required namespace: matrixStats
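
To make the toroidal property concrete, here is a small base R sketch of how distances behave on a grid whose borders wrap around. The function name toroid_distance and the grid size of 50 lines by 80 columns are hypothetical and chosen only for illustration; this is not the package's internal code.

```r
# Distance between two grid positions p and q, each given as c(line, column),
# on a grid with wrap-around borders (hypothetical helper for illustration).
toroid_distance <- function(p, q, lines, columns) {
  d_line <- abs(p[1] - q[1])
  d_col  <- abs(p[2] - q[2])
  # On a toroid the shorter way may lead across the border.
  d_line <- min(d_line, lines - d_line)
  d_col  <- min(d_col, columns - d_col)
  sqrt(d_line^2 + d_col^2)
}

# Planar distance for comparison (no wrap-around).
planar_distance <- function(p, q) sqrt(sum((p - q)^2))

# Two positions on the left and right border of the same line:
planar_distance(c(1, 1), c(1, 80))          # 79
toroid_distance(c(1, 1), c(1, 80), 50, 80)  # 1
```

Positions on opposite borders are direct neighbors on the toroid, so a valley that appears split across the borders of the map may belong to a single cluster.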

Third Module: Automatic Clustering

The number of clusters can be derived from the dendrogram (PlotIt=TRUE) or from the visualization. Here the visualization shows seven valleys, so we choose seven as the number of clusters (k=7). Outliers, if any, can be marked manually in the visualization after the automatic clustering. The function DBSclustering has one parameter to be set. Normally, the default setting StructureType = TRUE works fine. However, for density-based structures, StructureType = FALSE sometimes yields better results. Please verify with the visualization or the dendrogram. For the dendrogram, choose PlotIt=TRUE in the function ‘DBSclustering’.

library(DatabionicSwarm)
library(GeneralizedUmatrix)
Cls=DBSclustering(k=7, Hepta$Data, visualization$Bestmatches, visualization$LC,PlotIt=FALSE)
## Loading required namespace: parallelDist
## 
##      PLEASE NOTE:  The components "delsgs" and "summary" of the
##  object returned by deldir() are now DATA FRAMES rather than
##  matrices (as they were prior to release 0.0-18).
##  See help("deldir").
##  
##      PLEASE NOTE: The process that deldir() uses for determining
##  duplicated points has changed from that used in version
##  0.0-9 of this package (and previously). See help("deldir").
GeneralizedUmatrix::plotTopographicMap(visualization$Umatrix,visualization$Bestmatches,Cls)
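
How a number of clusters can be read off a dendrogram is sketched below with base R only. This is a generic illustration using hclust with average linkage on made-up data, not the dendrogram that DBSclustering computes; the data and the "largest gap" heuristic are assumptions for this example.

```r
# Made-up data: three small 3x3 grids of points around well-separated centers.
offsets <- expand.grid(dx = c(-0.5, 0, 0.5), dy = c(-0.5, 0, 0.5))
centers <- rbind(c(0, 0), c(10, 0), c(0, 10))
Data <- do.call(rbind, lapply(seq_len(nrow(centers)), function(i) {
  cbind(centers[i, 1] + offsets$dx, centers[i, 2] + offsets$dy)
}))

# Generic hierarchical clustering as a stand-in for illustration.
hc <- hclust(dist(Data), method = "average")

# A large jump in the merge heights marks the point where
# well-separated groups start to be merged with each other.
heights <- sort(hc$height)
largest_gap <- which.max(diff(heights))

# Cutting the dendrogram inside that gap leaves this many clusters.
k <- nrow(Data) - largest_gap
Cls <- cutree(hc, k = k)

k
table(Cls)
# plot(hc)  # shows the dendrogram with long branches above the gap
```

In the dendrogram, the same information shows up as a few long vertical branches at the top; counting them gives the number of clusters.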

Second Example: Interactive approach

First Module: Projection of high-dimensional Data

2D projection with instant visualization of the annealing steps. Normally, the distance matrix has to be defined by the user. In this case, Euclidean distances are computed automatically because the data itself is the input for ‘Pswarm’.

library(DatabionicSwarm)
data('Lsun3D')
projection=Pswarm(Lsun3D$Data,Cls=Lsun3D$Cls,PlotIt=TRUE,Silent=TRUE)
## [1] "Operator: An approximation of grid size was done."
## 
  |=================================================================| 100%

Second Module: Generalized Umatrix

If non-Euclidean distances are used, please use SammonsMapping from the ProjectionBasedClustering package with the correct OutputDimension to generate a new data matrix from the distances (see SheppardDiagram or KruskalStress). Here the Generalized Umatrix is calculated using a simplified emergent self-organizing map algorithm. Then the topographic map is visualized based on the information of the Generalized Umatrix.

library(DatabionicSwarm)
library(GeneralizedUmatrix)
visualization=GeneratePswarmVisualization(Data = Lsun3D$Data,projection$ProjectedPoints,projection$LC)
## [1] "Different Points are now on the same grid position. Sum of unique points (before) (after): 403 367"
## [1] "Initializing sESOM algorithm"
## [1] "Operator: getUmatrix4BMUs() at 8%"
## [1] "Operator: getUmatrix4BMUs() at 17%"
## [1] "Operator: getUmatrix4BMUs() at 25%"
## [1] "Operator: getUmatrix4BMUs() at 33%"
## [1] "Operator: getUmatrix4BMUs() at 42%"
## [1] "Operator: getUmatrix4BMUs() at 50%"
## [1] "Operator: getUmatrix4BMUs() at 58%"
## [1] "Operator: getUmatrix4BMUs() at 67%"
## [1] "Operator: getUmatrix4BMUs() at 75%"
## [1] "Operator: getUmatrix4BMUs() at 83%"
## [1] "Operator: getUmatrix4BMUs() at 92%"
## [1] "Operator: getUmatrix4BMUs() at 92%"
## [1] "Calculating Umatrix"
GeneralizedUmatrix::plotTopographicMap(visualization$Umatrix,visualization$Bestmatches)
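
The idea of generating a new data matrix from non-Euclidean distances can be sketched as follows. This example uses sammon() from the recommended package MASS as a stand-in, not the SammonsMapping function named above, and computes Kruskal's stress-1 by hand instead of calling KruskalStress. The data are made up, and the choice of three output dimensions is an assumption for illustration.

```r
# Made-up data with distinct rows (sammon() does not accept zero distances
# between different points).
Data <- cbind(x = 1:20, y = (1:20)^2 %% 7, z = sin(1:20))

# Non-Euclidean input distances.
InputDistances <- dist(Data, method = "manhattan")

# Sammon mapping into a Euclidean space of the chosen dimension.
OutputDimension <- 3
mapping <- MASS::sammon(InputDistances, k = OutputDimension, trace = FALSE)
DataMatrix <- mapping$points

# Kruskal's stress-1: how well the Euclidean distances of the new data
# matrix reproduce the input distances (0 means perfect agreement).
d_in  <- as.vector(InputDistances)
d_out <- as.vector(dist(DataMatrix))
stress <- sqrt(sum((d_in - d_out)^2) / sum(d_in^2))

dim(DataMatrix)
stress
```

If the stress is too high for your purpose, try a higher output dimension before passing the new data matrix on to the visualization.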

Third Module: Interactive Clustering

The number of clusters can be derived from the dendrogram (PlotIt=TRUE) or from the visualization. In this example, outliers should be marked manually in the visualization after the process of automatic clustering. Therefore, we choose the three main valleys as the number of clusters. The function DBSclustering has one parameter to be set. Normally, the default setting StructureType = TRUE works fine. However, for density-based structures, StructureType = FALSE sometimes yields better results. Please verify with the visualization or the dendrogram. For the dendrogram, choose PlotIt=TRUE in the function ‘DBSclustering’.

Often, it helps to first generate the shape of an island out of the continuous topographic map, because then the biggest mountains are already marked as the borders of the visualization. Then you can improve the clustering by redefining valleys interactively or by marking outliers lying in volcanoes. It is strongly suggested to verify such a clustering externally, e.g. with a Heatmap or some unsupervised index.

library(DatabionicSwarm)
library(GeneralizedUmatrix)
Cls=DBSclustering(k=3, Lsun3D$Data, visualization$Bestmatches, visualization$LC,PlotIt=FALSE)
GeneralizedUmatrix::plotTopographicMap(visualization$Umatrix,visualization$Bestmatches,Cls)
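
One unsupervised index for such an external verification is the silhouette width. The sketch below uses silhouette() from the recommended package cluster as a stand-in for the SilhouettePlot function of DataVisualizations, on made-up data with a made-up clustering; with your own data, you would use the Cls vector from above and the distances of your data instead.

```r
# Made-up data: three small 3x3 grids of points around well-separated centers.
offsets <- expand.grid(dx = c(-0.5, 0, 0.5), dy = c(-0.5, 0, 0.5))
centers <- rbind(c(0, 0), c(10, 0), c(0, 10))
Data <- do.call(rbind, lapply(seq_len(nrow(centers)), function(i) {
  cbind(centers[i, 1] + offsets$dx, centers[i, 2] + offsets$dy)
}))
Distances <- dist(Data)

# A clustering that matches the three groups.
GoodCls <- rep(1:3, each = nrow(offsets))
# A deliberately wrong clustering that mixes the groups, for comparison.
WrongCls <- rep(1:3, times = nrow(offsets))

# Mean silhouette width: values near 1 indicate compact, well-separated
# clusters; values near 0 or below indicate a poor clustering.
good_width  <- mean(cluster::silhouette(GoodCls, Distances)[, "sil_width"])
wrong_width <- mean(cluster::silhouette(WrongCls, Distances)[, "sil_width"])

good_width
wrong_width
```

Comparing the index before and after an interactive correction gives a quick check whether the manual changes improved the clustering.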

Generating the Shape of an Island out of visualization

To generate the 3D landscape in the shape of an island from the toroidal topographic map visualization, you may cut your island interactively around high mountain ranges. Currently, I am unable to show the output in R Markdown. Please uncomment the lines below and try it out yourself.

library(DatabionicSwarm)
library(ProjectionBasedClustering)
## 
## Attaching package: 'ProjectionBasedClustering'
## The following object is masked _by_ '.GlobalEnv':
## 
##     Hepta
## The following object is masked from 'package:DatabionicSwarm':
## 
##     Hepta
library(GeneralizedUmatrix)

#Imx = ProjectionBasedClustering::interactiveGeneralizedUmatrixIsland(visualization$Umatrix,visualization$Bestmatches,Cls)

#GeneralizedUmatrix::plotTopographicMap(visualization$Umatrix,visualization$Bestmatches, Cls=Cls,Imx = Imx)

Manually Improving the Clustering Using the Visualization

In this example, the four outliers can be marked manually with mouse clicks using the shiny interface. Currently, I am unable to show the output in R Markdown. Please uncomment the line below and try it out yourself.

library(ProjectionBasedClustering)#install if not installed
#Cls2=ProjectionBasedClustering::interactiveClustering(visualization$Umatrix, visualization$Bestmatches, Cls)

References

[Thrun, 2018] Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, Heidelberg, ISBN: 978-3-658-20539-3, https://doi.org/10.1007/978-3-658-20540-9, 2018.